Appendix C — Assignment C

Instructions

  1. You may talk to a friend, discuss the questions and potential directions for solving them. However, you need to write your own solutions and code separately, and not as a group activity.

  2. Do not write your name on the assignment.

  3. Write your code in the Code cells of the Jupyter notebook. Ensure that the solution is written neatly enough to understand and grade.

  4. Use Quarto to print the .ipynb file as HTML. You will need to open the command prompt, navigate to the directory containing the file, and use the command: quarto render filename.ipynb --to html. Submit the HTML file.

  5. There are 5 points for cleanliness and organization. The breakdown is as follows:

  • Must be an HTML file rendered using Quarto (1.5 pts).

  • There aren’t excessively long outputs of extraneous information (e.g. no printouts of unnecessary results without good reason, there aren’t long printouts of which iteration a loop is on, there aren’t long sections of commented-out code, etc.) (1 pt)

  • There is no piece of unnecessary / redundant code, and no unnecessary / redundant text (1 pt)

  • The code should be commented and clearly written with intuitive variable names. For example, use variable names such as number_input, factor, hours, instead of a,b,xyz, etc. (1.5 pts)

  6. The assignment is worth 100 points, and is due on 29th April 2023 at 11:59 pm.

C.1 GDP of the USA

USA’s GDP per capita from 1960 to 2021 is given by the tuple T in the code cell below. The values are arranged in ascending order of the year, i.e., the first value is for 1960, the second value is for 1961, and so on.

T = (3007, 3067, 3244, 3375, 3574, 3828, 4146, 4336, 4696, 5032, 5234, 5609, 6094, 6726, 7226, 7801, 8592, 9453, 10565, 11674, 12575, 13976, 14434, 15544, 17121, 18237, 19071, 20039, 21417, 22857, 23889, 24342, 25419, 26387, 27695, 28691, 29968, 31459, 32854, 34515, 36330, 37134, 37998, 39490, 41725, 44123, 46302, 48050, 48570, 47195, 48651, 50066, 51784, 53291, 55124, 56763, 57867, 59915, 62805, 65095, 63028, 69288)

C.1.1 Gaps

Use a list comprehension to produce a list of the gaps between consecutive entries in T, i.e., the increase in GDP per capita with respect to the previous year. The list of gaps should look like: [60, 177, …].

(6 points)
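
As a minimal sketch of the general pattern (not a solution), the snippet below computes differences between consecutive entries of a small made-up tuple. The name readings and its values are hypothetical and are not related to T.

```python
# Hypothetical toy data; this is not the GDP tuple T from the assignment
readings = (10, 13, 19, 18, 25)

# Pair each entry with its successor, then subtract the earlier from the later
differences = [later - earlier for earlier, later in zip(readings, readings[1:])]

print(differences)  # [3, 6, -1, 7]
```

Pairing the sequence with a shifted copy of itself via zip avoids index arithmetic; an index-based comprehension over range(1, len(readings)) would work equally well.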

C.1.2 Maximum gap size

Use the list developed in C.1.1 to find the maximum gap size, i.e., the maximum increase in GDP per capita.

(2 points)

C.1.3 Gaps higher than $1000

Using list comprehension with the list developed in C.1.1, find the percentage of gaps that have size greater than $1000.

(6 points)
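
One possible way to turn a filtering comprehension into a percentage is sketched below on made-up numbers. The list differences and the value of threshold are assumptions chosen for illustration only.

```python
# Hypothetical toy data and threshold, chosen only to illustrate the pattern
differences = [3, 6, -1, 7]
threshold = 5

# Keep only the entries that exceed the threshold
large_differences = [value for value in differences if value > threshold]

# Share of entries above the threshold, expressed as a percentage
percentage_large = 100 * len(large_differences) / len(differences)

print(percentage_large)  # 50.0
```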

C.1.4 Dictionary

Create a dictionary D, where each key is a year and the value for that key is the increase in GDP per capita in that year with respect to the previous year, i.e., the gaps computed in C.1.1. Since 1960 has no previous year in T, the first key is 1961.

(6 points)
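
The following sketch shows one way to attach year labels to a list of year-over-year changes, assuming a made-up series whose first reading is for the year 2000. The names first_year and differences are hypothetical.

```python
# Hypothetical toy data: changes relative to the previous year,
# for a series whose first reading was taken in 2000
first_year = 2000
differences = [3, 6, -1, 7]

# The first change belongs to the year after first_year, hence start=1
change_by_year = {
    first_year + offset: change
    for offset, change in enumerate(differences, start=1)
}

print(change_by_year)  # {2001: 3, 2002: 6, 2003: -1, 2004: 7}
```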

C.1.5 Maximum increase

Use the dictionary D to find the year in which the increase in GDP per capita over the previous year was the largest. Use a list comprehension.

(6 points)

Hint: […… for …. in D.items() if ……]
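
To illustrate the shape of the hint on unrelated data, the sketch below filters the items of a small made-up dictionary by a condition on the value. The dictionary change_by_year is hypothetical.

```python
# Hypothetical toy dictionary mapping a year to a change in some quantity
change_by_year = {2001: 3, 2002: 6, 2003: -1, 2004: 7}

# Compute the target value once, outside the comprehension
largest_change = max(change_by_year.values())

# Collect every key whose value matches the target
years_with_largest_change = [
    year for year, change in change_by_year.items() if change == largest_change
]

print(years_with_largest_change)  # [2004]
```

The result is a list because several keys could share the same value; changing the condition after if selects a different subset of keys.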

C.1.6 GDP per capita decrease

Use the dictionary D to find the years when the GDP per capita decreased with respect to the previous year. Use the list comprehension method.

(6 points)

C.2 TED Talks

C.2.1 Reading data

Read the file TED_Talks.json on TED talks using the code below. You will get the data in the object TED_Talks_data. Inspect the data structure of TED_Talks_data. You will need to know how the data is structured in lists/dictionaries to answer the questions below.

Note that the data must be stored in the same directory as the notebook.

(2 points)

import json
with open("TED_Talks.json", "r") as file:
    TED_Talks_data = json.load(file)
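
A few built-in calls are usually enough to explore how a loaded JSON object is nested. The sketch below uses a small made-up JSON string in place of a file; its keys (title, tags, stats) are hypothetical and say nothing about how TED_Talks.json is actually organized.

```python
import json

# Hypothetical stand-in for the contents of a JSON file;
# the real TED_Talks.json may be structured quite differently
sample_text = (
    '[{"title": "A", "tags": ["x", "y"], "stats": {"count": 3}},'
    ' {"title": "B", "tags": [], "stats": {"count": 5}}]'
)
sample_data = json.loads(sample_text)

# Type and size of the outermost container
print(type(sample_data), len(sample_data))

# Look at one element to see what the next level contains
first_item = sample_data[0]
print(type(first_item))
print(list(first_item.keys()))

# Check the type stored under each key of that element
for key, value in first_item.items():
    print(key, type(value).__name__)
```

Repeating the same steps one level down (for example on a nested dictionary) reveals the full structure without printing the whole dataset.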

C.2.2 Number of talks

Find the number of talks in the dataset.

(2 points)

C.2.4 Mean and median views

What are the mean and median number of views for a talk? Can we say that the majority of talks (i.e., more than 50% of the talks) have fewer views than the average number of views for a talk? Justify your answer.

(6 points)
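
The relationship between the mean and the median can be seen on a small skewed sample. The sketch below uses made-up counts and the standard-library statistics module; the list view_counts is hypothetical.

```python
import statistics

# Hypothetical, deliberately skewed toy data: one very large value
view_counts = [10, 20, 30, 40, 400]

mean_views = statistics.mean(view_counts)
median_views = statistics.median(view_counts)

# Fraction of entries that fall below the mean
share_below_mean = sum(1 for count in view_counts if count < mean_views) / len(view_counts)

print(mean_views, median_views, share_below_mean)
```

In this toy sample the single large value pulls the mean well above the median, so most entries lie below the mean. Whether the same holds for another dataset has to be checked on that dataset.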

C.2.5 Views vs average views

Do at least 25% of the talks have more views than the average number of views for a talk? Justify your answer.

(4 points)

C.2.6 Confusing talks

Find the headline of the talk that received the highest number of votes in the Confusing category.

(8 points)

C.2.7 Fascinating talks

Find the headline and the year_filmed of the talk that received the highest percentage of votes in the Fascinating category.

\[\text{Percentage of } \textit{Fascinating} \text{ votes for a TED talk} = \frac{\text{Number of votes in the } \textit{Fascinating} \text{ category}}{\text{Total votes in all categories}}\]

(10 points)
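
The formula above can be applied to any record that stores vote counts per category. The sketch below assumes a made-up record layout (a label plus a votes dictionary with categories A and B); this layout is hypothetical and is not taken from TED_Talks.json.

```python
# Hypothetical records; the layout and category names are assumptions
records = [
    {"label": "first", "votes": {"A": 60, "B": 140}},
    {"label": "second", "votes": {"A": 45, "B": 5}},
]

def category_share(record, category):
    """Votes in one category divided by the total votes in all categories."""
    total_votes = sum(record["votes"].values())
    return record["votes"][category] / total_votes

# Record with the highest raw count versus the highest share in category A
highest_count = max(records, key=lambda record: record["votes"]["A"])
highest_share = max(records, key=lambda record: category_share(record, "A"))

print(highest_count["label"])  # first
print(highest_share["label"])  # second
```

The two answers differ here, which is why ranking by share requires dividing by each record's own total before comparing.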

C.3 Poker

The object deck defined below corresponds to a deck of cards. Estimate the probability that a five card hand will be:

  1. Straight

  2. Three-of-a-kind

  3. Two-pair

  4. One-pair

  5. High card

You may check the meaning of the above terms here.

(25 points)

Hint:

Estimate these probabilities as follows.

  1. Write a function that accepts a hand of 5 cards as an argument, and returns relevant characteristics of the hand, such as the number of distinct card values, the maximum number of occurrences of a value, etc. Using the values returned by this function (perhaps in a dictionary), you can determine whether the hand is of any of the above types (Straight / Three-of-a-kind / two-pair / one-pair / high card).

  2. Randomly pull a hand of 5 cards from the deck. Call the function developed in (1) to get the relevant characteristics of the hand. Use those characteristics to determine if the hand is one of the five mentioned types (Straight / Three-of-a-kind / two-pair / one-pair / high card).

  3. Repeat (2) 10,000 times.

  4. Estimate the probability of the hand being of the above five mentioned types (Straight / Three-of-a-kind / two-pair / one-pair / high card) from the results of the 10,000 simulations.

You may use the function shuffle() from the library random to shuffle the deck every time before pulling a hand of 5 cards.

You don’t need to stick to the hint if you feel you have a better way to do it. In case you have a better way, you can claim 10 bonus points for this assignment.

deck = [{'value': i, 'suit': c}
        for c in ['spades', 'clubs', 'hearts', 'diamonds']
        for i in range(2, 15)]
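
The simulation approach in the hint can be sketched on a hand type that is not part of the question, here a flush (all five cards of the same suit, straight flushes included). The function names, the keys of the returned dictionary, and the fixed seed are illustrative assumptions, not a prescribed design.

```python
import random
from collections import Counter

# Same construction as the deck defined in the assignment
deck = [{'value': i, 'suit': c}
        for c in ['spades', 'clubs', 'hearts', 'diamonds']
        for i in range(2, 15)]

def hand_characteristics(hand):
    """Summarize a hand; the chosen keys are illustrative, not required."""
    value_counts = Counter(card['value'] for card in hand)
    suits = {card['suit'] for card in hand}
    return {
        'distinct_values': len(value_counts),
        'max_occurrences': max(value_counts.values()),
        'distinct_suits': len(suits),
    }

def estimate_flush_probability(number_of_trials, seed=0):
    """Estimate the share of 5-card hands in which all cards share one suit."""
    generator = random.Random(seed)  # seeded only to make the sketch repeatable
    cards = list(deck)               # copy, so the original deck order is kept
    flush_count = 0
    for _ in range(number_of_trials):
        generator.shuffle(cards)
        hand = cards[:5]
        if hand_characteristics(hand)['distinct_suits'] == 1:
            flush_count += 1
    return flush_count / number_of_trials

flush_estimate = estimate_flush_probability(10_000)
print(flush_estimate)
```

The exact probability of this event is 5148 / 2598960, roughly 0.002, so an estimate from 10,000 trials rests on only a handful of hits and will vary noticeably between runs. The same loop can count any other hand type once the condition inside it is replaced.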